This invention relates to recombinant DNA comprising vector-DNA and
a DNA sequence corresponding with, or related to, a salicylate-inducible
promoter of a GRP gene of plants, such as tobacco plants. The invention
also relates to microorganisms, plant cells and plants transformed using
the recombinant DNA, to a process for introducing an inducible property
in plants and to a process for producing a polypeptide or protein, using
plant cells and plants transformed using the recombinant DNA.
Recombinant DNA; transformed microorganisms, plant cells and plants; a process for introducing an
inducible property in plants, and a process for producing a polypeptide or protein by means of plants
or plant cells



A. Field of the invention 

This invention is in the field of recombinant DNA technology and is
based on the identification of GRP (glycine-rich protein) genes
occurring in plants with a salicylate-inducible promoter. More
particularly, the invention relates to the use of such
a salicylate-inducible promoter.



B. State of the art 

Plants are continuously subject to influences 
from their environment which may involve a threat. 
These influences may relate to such factors as 
temperature, light, humidity, salt and injuries, but
also attack by pathogens, such as viruses, fungi, 
bacteria, insects and the like. For its survival, the 
plant has available a broad range of defensive 
mechanisms which are activated when the plant is 
subject to the "stress" conditions referred to. This 
activation is generally accompanied by the induc- 
tion of the expression of specific plant genes. This 
induction is controlled by control elements often 
present in the promoter region upstream of the 
gene in question. A given stress factor may either 
activate a highly specific set of plant genes, or 
result in a broad response of many defence genes. 
Thus an increase in temperature for a short period 
of time leads to the expression of so-called "heat 
shock" (HS) protein genes; the plant is subsequently
resistant to temperatures to which untreated
plants are not resistant. A conserved sequence
of about 14 basepairs occurring several times in 
the promoter region of HS protein genes has been 
found to be responsible for the induction of these 
genes (see e.g. Pelham and Bienz, 1982; Bienz,
1985). 

Various light-inducible genes have meanwhile 
been cloned. When a sequence of several hun- 
dreds of basepairs located upstream of these 
genes is fused with a "reporter" gene, for example, 
the chloramphenicol-acetyl-transferase gene (CAT 
gene), this gene becomes light-inducible in transgenic
plants (see e.g. Kuhlemeier et al., 1987;
Green et al., 1987; Stockhaus et al., 1987). In the
promoter regions of a number of light-inducible 
genes, a common element of 9 basepairs can be 
distinguished, which is possibly involved in the light 
inducibility (Grob and Stuber, 1987). 

When plants are injured, either mechanically or from being eaten by
insects, plant genes are activated which code, inter alia, for
proteinase inhibitors. These proteins, which are best characterized
in tomato and potato, have virtually no effect on proteolytic enzymes
of the plant but specifically inhibit digestive enzymes of animals, in
particular those of insects. When a proteinase inhibitor gene of potato
is induced by injury, it is found that, inter alia, base sequences are
involved which are located downstream of the gene (Thornburg et al.,
1987). When a proteinase inhibitor gene is placed under the control of
a constitutive promoter (the CaMV-35S-promoter) and expressed in
transgenic plants, the plant is found to have become highly resistant
to insect damage (Hilder et al., 1987).

To be able to defend itself against infection by pathogens, the plant
has a mechanism known by the name of "hypersensitive response". When,
as a result of infection, this mechanism becomes activated, the
infected plant cells die, and a lignin wall is formed around the centre
of infection, which the pathogen is unable to pass. This means that
infection results in necrotic lesions at the centres of infection, but
the other parts of the plant remain virtually free of pathogen.
Pathogens not activating the hypersensitive response may spread
throughout the entire plant and accumulate to high concentrations.

It has been found that in the case of a necrotic infection the
pathogen-free parts of the plant develop a resistance to a second
infection by a broad range of pathogens, such as viruses, fungi and
bacteria ("acquired resistance"), no matter what type of pathogen
caused the first infection. Thus a necrotic virus infection leads to
resistance to fungi and vice versa. Owing to the necrotic infection, a
large number of genes are induced in the pathogen-free parts of the
plant (for a survey, see: Van Loon, 1982; Van Loon, 1985; Collinge and
Slusarenko, 1987; Bol and Van Kan, 1988; Bol, 1988; Van Loon, 1988).
It is supposed that products coded for by the induced genes play a role
in the acquired resistance. Part of the induced genes code for enzymes
which, starting from the amino acid phenylalanine, synthesize a
diversity of aromatic compounds. These include compounds inhibiting the
growth of fungi and called phytoalexins. They also include precursors
of the lignin used in reinforcing cell walls and forming a barrier
around a centre of infection. Another part of the induced genes codes
for hydroxyproline-rich glycoproteins (HRGP, extensin) which are
incorporated in the cell wall and function as a matrix for
attaching aromatic compounds, such as lignin. A
third group of induced genes, finally, codes for
proteins which accumulate in the vacuole in the
plant cell or are excreted in the intercellular space
of the leaf. These so-called PR proteins 
(pathogenesis-related proteins) are best character- 
ized in tobacco but occur in the plant kingdom in a 
highly conserved form. For one part they turn out 
to be hydrolytic enzymes, such as chitinases and 
glucanases, which in combination efficiently inhibit 
the growth of fungi on artificial media. Another PR
protein is thought to inhibit the digestive enzymes 
of insects. The function of the other PR proteins is 
unknown. 

White (1979) has found that the treatment of 
tobacco with certain aromatic compounds, such as 
salicylic acid (in the neutralized form) leads to the 
induction of a subgroup of PR proteins, i.e. the PR- 
1 proteins, and to a resistance to virus infection. 
This was seen as an indication that this subgroup 
of PR proteins is involved in the induced resistance 
to virus infection. Fraser (1983) argued against this 
that there are conditions which induce PR-1 pro- 
teins but do not generate an antiviral response. 
Hooft van Huijsduijnen et al. (1986) cloned DNA 
copies of six classes of messenger-RNA (mRNAs) 
which in Samsun NN tobacco are induced by to- 
bacco mosaic virus (TMV) infection. Two of these 
classes of mRNAs are also induced by salicylate. 
One of these turned out to correspond to the PR-1 
proteins. The other does not correspond to known 
PR proteins and was initially called "cluster C". 
Meanwhile the name has been changed into GRP- 
mRNA by reason of the discovery that it codes for 
a glycine-rich protein. This last suggests that the 
protein could be a cell wall component, comparable 
in function to the HRGP (Varner and Cassab,
1986). The copy DNAs (cDNAs) of the PR-1 
mRNAs have been used as a probe for isolating
clones of PR-1 genes from a genomic library of
tobacco; the base sequence of these has been
determined (Cornelissen et al., 1987).



C. Description of the invention 

By hybridizing GRP-cDNA with a Southern blot of DNA of Nicotiana
tabacum cv. Samsun NN, it was found that the genome of tobacco contains
about eight GRP genes. From a genomic library of Nicotiana tabacum cv.
Samsun NN, four GRP genes were cloned. The base sequence of two of the
cloned GRP genes was determined. Both were found to consist of two
exons coding for a protein of 109 amino acids. After splitting off a
putative signal peptide, the mature protein consists of about 25%
glycine and about 30% charged amino acids. By S1-nuclease mapping, it
was found that one of the two genes analysed is expressed. The sequence
of this gene (clone gGRP-8) and the flanking DNA regions is given in
Fig. 1. The other gene is probably not expressed in response to virus
infection. From this, and from an analysis of the base sequence of
cloned GRP-cDNAs, it can be concluded that at least three of the eight
GRP genes are expressed after virus infection. The data obtained
indicate that there is more than 80% homology between the coding
sequences of the various GRP genes and also between the upstream DNA
regions.
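
As an illustration of how composition figures such as "about 25% glycine" and "about 30% charged amino acids" can be computed from a one-letter protein sequence, the following is a minimal sketch. It is not taken from the original work: the function names are hypothetical, the peptide is a toy example rather than the GRP sequence, and "charged" is assumed here to mean Asp, Glu, Lys and Arg only.

```python
from collections import Counter

# Assumption: "charged" residues are Asp (D), Glu (E), Lys (K) and Arg (R);
# histidine is deliberately left out of this set.
CHARGED = frozenset("DEKR")


def composition(protein: str) -> dict:
    """Return the fraction of each residue in a one-letter protein sequence."""
    seq = protein.upper()
    if not seq:
        raise ValueError("empty sequence")
    counts = Counter(seq)
    return {aa: n / len(seq) for aa, n in counts.items()}


def glycine_and_charged(protein: str) -> tuple:
    """Return (glycine fraction, charged-residue fraction)."""
    comp = composition(protein)
    glycine = comp.get("G", 0.0)
    charged = sum(frac for aa, frac in comp.items() if aa in CHARGED)
    return glycine, charged


# Hypothetical toy peptide, NOT the GRP sequence of Fig. 1.
toy_peptide = "GGRGYKDE"
gly, chg = glycine_and_charged(toy_peptide)
print(f"glycine: {gly:.1%}, charged: {chg:.1%}")
```

To apply this to the mature protein, the putative signal peptide would first have to be removed from the translated reading frame.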

Fragments of the promoter region of the GRP gene in clone gGRP-8 were
fused with the CAT reporter gene. By means of the Agrobacterium
tumefaciens technology, these constructs were integrated into the
genome of tobacco, and the transgenic plants were tested for
inducibility of the CAT gene by salicylate. In a reproducible manner,
it was found that the first 114 nucleotides upstream of the
transcription initiation site contain one or more elements which cause
the promoter to become inducible by salicylate. This promoter was found
to be also induced by several other substances, including acrylic acid,
ethylene, and ethephon. Between the nucleotides -400 and -645 of
Fig. 1, there are one or more elements which greatly enhance the
salicylate-inducible activity of the promoter. If, therefore, a DNA
fragment carrying the sequence of nucleotide -645 to +8 is coupled to
any given gene, then, after transformation of plants with this
construct, it will be possible for the gene in question to be induced
with salicylate and several other specific aromatics in a controlled
manner. At this moment, no other plant promoters have been
characterized which can be regulated with a chemical effector in such
a simple manner.
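
The coordinate arithmetic behind a fragment such as "nucleotide -645 to +8" can be sketched as follows. This is an illustration only, not part of the original disclosure: the function names and the toy sequence are hypothetical, and the sketch assumes the common convention that +1 is the transcription initiation site and that there is no position 0.

```python
def to_index(pos: int, tss_index: int) -> int:
    """Convert a promoter coordinate to a 0-based string index.

    Assumption: +1 is the transcription start and position 0 does not
    exist, so -1 lies immediately upstream of +1.
    `tss_index` is the 0-based index of position +1 in the sequence.
    """
    if pos == 0:
        raise ValueError("position 0 does not exist in this convention")
    return tss_index + pos - 1 if pos > 0 else tss_index + pos


def fragment(seq: str, start: int, end: int, tss_index: int) -> str:
    """Return the subsequence from `start` to `end`, both inclusive."""
    i = to_index(start, tss_index)
    j = to_index(end, tss_index)
    if not (0 <= i <= j < len(seq)):
        raise ValueError("requested fragment lies outside the sequence")
    return seq[i:j + 1]


# Toy sequence (hypothetical): 700 upstream bases, then +1 = "G", then 20 more.
toy = "A" * 700 + "G" + "C" * 20
promoter = fragment(toy, -645, 8, tss_index=700)
print(len(promoter))  # length of a -645..+8 fragment under this convention
```

If the numbering of Fig. 1 does include a position 0, the index conversion would need to be adjusted accordingly and the fragment would be one nucleotide longer.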



D. Further elaboration of the invention 

The invention provides broadly recombinant
DNA comprising vector-DNA and a DNA sequence
corresponding to, or related to, a salicylate-inducible
promoter of a GRP gene of plants.

The vector-DNA portion of the recombinant DNA according to the
invention is not critical per se, and is determined by the contemplated
use of the recombinant DNA, in particular the host to be transformed.
Those skilled in the art know what vectors are suitable for given
hosts. Known vectors which can be used in the Agrobacterium tumefaciens
technology for the transformation of plants and plant cells are, for
example, pAGS127 (van den Elzen et al., 1985) and pROK1 (Baulcombe
et al., 1986). Known vectors which can be used for cloning in bacteria,
such as Escherichia coli, are, for example, the various pUC plasmids.

As is well known to those skilled in the art, the vector DNA will
commonly, in addition to an origin of replication that is suitable to
the host, also contain one or more marker genes, e.g., certain
antibiotic-resistance genes, in order that transformed hosts may be
selected with facility.

The novel and inventive element in the recom- 
binant DNA according to this invention consists in 
the DNA sequence which corresponds with, or is
related to, a salicylate-inducible promoter of a GRP
gene of plants. Fig. 1 illustrates one concrete ex- 
ample of such a GRP gene, comprising a structural 
GRP gene and flanking regulation sequences. In 
nature, however, variants occur which are com- 
prised by the present invention as far as they 
contain a salicylate-inducible promoter. The same 
applies to artificially constructed variants not dem- 
onstrated to be naturally occurring: these too are 
comprised by the present invention, provided they 
contain a salicylate-inducible promoter. Of the
flanking DNA sequences, only certain portions are 
responsible for the promoter function, the inducibil- 
ity of the promoter by salicylate, and the strength 
of either the promoter or its inducibility by salicy- 
late. Particularly in the other portions of the flanking 
regions, considerable variations are permissible. As 
regards the nucleotide sequence of the possible 
structural gene placed under the control of the 
promoter sequence, changes which do not affect 
the eventual sequence of amino acids will often be 
permissible. Changes leading to minor deviations in 
the sequence of amino acids will in many cases be 
still without consequences for the expression and 
function of the protein. The place, length and 
nucleotide sequence of introns can generally be 
varied as well, provided they can be processed by 
the host. 

It should be noted that the term "GRP gene", 
as used herein, means not only the DNA coding for 
GRP, but, in a broader sense, the DNA involved in 
the expression of GRP, including the DNA coding 
for GRP (designated herein as structural GRP 
gene) and flanking DNA regions with regulating 
functions, including the GRP promoter. 

Preferred embodiments of the invention de- 
scribed herein consist in the use of the GRP pro- 
moter for the following purposes. 

1. Controlled production of commercially interesting
proteins in plants

For the production through recombinant DNA techniques of proteins that
have to undergo a post-translational modification, e.g., glycosylation,
it is advisable to use eukaryotic organisms. It is to be expected that,
for this production, in addition to yeast and animal cells, plants can
be used in future. By means of the GRP promoter, the production of the
desired protein can be switched on at a controlled point of time by
spraying or watering the plants with a solution containing millimolar
quantities of sodium salicylate. This is in particular of importance
when the protein to be produced is toxic to the plant or, for example,
owing to a one-sided amino acid composition, forms a burden for the
plant's metabolism. The salicylate can also be supplied through the
ground water, when a local effect only is considered undesirable. In
addition, in that case a separate step for rinsing off the salicylate,
which when dried may induce necrosis on the leaves, can be dispensed
with. When the GRP promoter or derivatives thereof are fused with the
code for a suitable signal peptide, there is the possibility of causing
the desired protein to be secreted by the plant in the intercellular
space of the leaf, from which it can be isolated in a simple manner in
relatively pure form.

2. Controlled expression of genes in plants

Another possibility is the expression of genes to be controlled from
the outside, with the object of controlling certain processes in the
plant which, for example, are of interest for agricultural use. Thus
genes involved in disease resistance could be expressed in a controlled
manner. Also, this promoter, in combination with suitable genes
involved in disease resistance, will react both rapidly and with great
effectiveness in response to infection by a large group of pathogens,
resulting in a more effective resistance reaction. This is the case
because the original GRP gene, for example after infection of tobacco
with TMV, is one of the fastest and most efficiently reacting genes.
The genes in question, controlled by the GRP promoter, may originate
from the plant itself, or have been introduced from the outside and
originate either from other plants or from other organisms (after being
rendered suitable for expression and functioning of the gene product in
the plant).



3. The controlled production of commercially interesting
proteins in plant cell cultures

Various biotechnologically oriented firms and institutions are at
present investigating the possibility of utilizing large-scale cultures
of genetically engineered plant cells for the production of proteins or
secondary metabolites. In principle, there is the possibility of
bringing the expression of an economically interesting gene under the
control of the GRP promoter or derivatives thereof. Through standard
techniques, cell cultures or root cultures can be obtained from plant
material transformed by the Agrobacterium tumefaciens technology with
the promoter/gene fusion construct in question. In such cell cultures,
the gene concerned can be induced at the desired moment by adding
sodium salicylate to the culture medium in millimolar quantities.
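
As a worked example of what "millimolar quantities" means in practice, the following minimal sketch converts a target concentration and volume into a mass of sodium salicylate. It is illustrative only: the function name is hypothetical, and the molar mass used (about 160.10 g/mol) is that of anhydrous sodium salicylate, C7H5NaO3, which is an assumption about the reagent rather than something stated in the text.

```python
# Molar mass of anhydrous sodium salicylate (C7H5NaO3), in g/mol.
# Assumption: the anhydrous salt is used.
SODIUM_SALICYLATE_G_PER_MOL = 160.10


def grams_needed(conc_mM: float, volume_L: float,
                 molar_mass: float = SODIUM_SALICYLATE_G_PER_MOL) -> float:
    """Mass in grams to dissolve for `conc_mM` millimolar in `volume_L` litres."""
    if conc_mM < 0 or volume_L < 0:
        raise ValueError("concentration and volume must be non-negative")
    moles = conc_mM / 1000.0 * volume_L
    return moles * molar_mass


# Example: a 1 mM solution in 1 litre of culture medium.
print(f"{grams_needed(1.0, 1.0):.4f} g")
```

The same function can be used for spraying or watering solutions by passing the relevant volume.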



E. Examples 



I. Cloning of GRP-cDNA 

Polyadenylated RNA was isolated from tobacco mosaic virus infected
tobacco and enriched through gradient centrifugation for molecules of
650 nucleotides (Hooft van Huijsduijnen et al., 1986). Using standard
techniques, well known to researchers in this field, the RNA was copied
into minus-strand DNA by means of an oligo(dT) primer, reverse
transcriptase and deoxyribonucleotide triphosphates. Subsequently,
using RNase H and DNA polymerase, a complementary DNA chain was
synthesized on this DNA by the method of Gubler and Hoffman (1983). The
double-stranded DNA was provided with C tails, which were hybridized
with G tails, formed on the plasmid pUC9 after this had been cleaved
with PstI (Maniatis et al., 1982). This construct was used for the
transformation of E. coli MH-1. The transformants were streaked in
duplicate on nitrocellulose filters. One filter was hybridized with
cDNA of poly(A)-RNA from TMV-infected tobacco, transcribed in vitro;
the other filter was hybridized with cDNA against poly(A)-RNA from
healthy tobacco (Maniatis et al., 1982). Transformants hybridizing
better with the first probe than with the second contained cDNA of
mRNAs induced by TMV infection. From these transformants, plasmid was
isolated, the insert was subcloned in M13 vectors and the sequence of
the insert was determined by the method of Sanger et al. (1977). Clones
with sequences homologous to the sequence of nucleotides given in
Fig. 1 contain the GRP-cDNA. As an alternative to the differential
hybridization method, the cDNA library can be searched with a probe
consisting of a deoxyoligonucleotide synthesized on the basis of the
sequence of the GRP exons given in Fig. 1.
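
The design of such an oligonucleotide probe can be sketched in a few lines. The example below is a hedged illustration, not the authors' procedure: the function names are hypothetical, and the 20-mer is copied from the start of the coding region as printed in Fig. 1 (ATG onwards), so it should be checked against the original figure before any real use.

```python
# Complement table for unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    seq = dna.upper()
    if set(seq) - set("ACGT"):
        raise ValueError("sequence contains characters other than A, C, G, T")
    return seq.translate(_COMPLEMENT)[::-1]


def gc_fraction(dna: str) -> float:
    """Return the fraction of G and C bases in the sequence."""
    seq = dna.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)


# First 20 nucleotides of the GRP reading frame as printed in Fig. 1
# (assumption: the printed sequence is free of transcription errors).
sense_20mer = "ATGGGTTCTAAGGCATTTCT"
antisense_probe = reverse_complement(sense_20mer)
print(antisense_probe, f"GC = {gc_fraction(antisense_probe):.0%}")
```

For screening a double-stranded cDNA library either strand can in principle serve as the probe; the antisense strand is shown because it would also hybridize to the mRNA.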



II. Cloning of GRP genes 

DNA isolated from the nuclei of Samsun NN tobacco was partially
digested with Sau3AI and cloned in the vector Charon 35 (for
references, see: Cornelissen et al., 1987). The genomic library was
searched with the plaque hybridization technique of Benton and Davis
(1977), using the cDNA isolated in Example I as a probe. The insert in
positively hybridizing phages contained the GRP gene and could be
subcloned in pUC9 plasmids in parts through standard techniques.

III. Determination of GRP promoter activity

The construction of GRP promoter/CAT gene fusions is illustrated
diagrammatically in Fig. 2. A HindIII fragment of gGRP-8 containing the
sequence of nucleotides -645 to +155 was subcloned. From position +155,
deletions were made with Bal31, whereafter the ends were provided with
ClaI linkers by standard techniques. HindIII-ClaI fragments were
subcloned in the vector pUCC and characterized by means of sequence
analysis. One deletion mutant (pDEL+8) turned out to contain the
sequence of from -645 to +8 and accordingly lacks the ATG initiation
codon of the GRP gene.

The polyadenylation signal of the nopaline-synthase gene (Tnos) was
isolated from plasmid pDH52 (Van Dun et al., 1987) as a 2 kb BamHI
fragment and cloned in pIC19H (Marsh et al., 1984), which yielded
pIC19H-Tnos. From this plasmid, Tnos can be cut as a 260 bp EcoRI
fragment and subcloned in pUC8, which produced pUC8-Tnos. This 260 bp
Tnos fragment was also subcloned in pDEL+8 downstream of the GRP
promoter. Finally, the CAT gene of transposon Tn9 (Alton and Vapnek,
1979) was isolated as a 773 bp TaqI fragment from the pCaMV-CAT plasmid
(Fromm et al., 1985) and cloned in the ClaI site of construct pPR645,
which produced plasmid pPRC645.

Fragments of pDEL+8 were subcloned in pUC8-Tnos as blunt ClaI
fragments. Subsequently, the 773 bp TaqI fragment with the CAT gene was
integrated in these constructs. The promoter fragments of pDEL+8 fused
with the CAT gene by this route were cut at the 3' site with ClaI at
position +8 and at the 5' site with the enzymes EcoRV (at position
-400), HaeIII (at position -135) or AvaII (at position -114),
respectively. The corresponding plasmids were called pPRC400, pPRC135
and pPRC114. These three plasmids and the pPRC645 plasmid were
linearized with HindIII and cloned in the HindIII site of the binary
transformation vector pAGS127. The pCaMV-CAT plasmid was cloned as an
XbaI fragment in pAGS127. The resulting constructs were transferred to
Agrobacterium tumefaciens, strain LBA4404 (Ooms et al., 1982), and the
transconjugants were used to transform Samsun NN with the leaf disc
method by standard procedures. Transgenic plants, regenerated from
shootlets, were tested by punching discs from the leaf and causing
these to float on water or a solution of 1 mM salicylic acid for 24 h.
Protein extracts of these discs were tested for CAT activity according
to Gorman et al. (1982). Fig. 3 shows the results. In lanes 1 and 2,
400 µl protein was used, in the other lanes 100 µl. In lanes W and S,
protein was used from discs floated on water and salicylic acid,
respectively. In lane 3, protein was used which had been isolated
immediately after punching leaf discs. Lanes 6 up to and including 11
show that GRP promoter sequences of 400, 135 and 114 bp give the same
degree of salicylic acid inducible CAT activity. Although relatively
low, this activity is significant, as can be seen after magnifying the
signal in lanes 1 and 2. The construct with the 645 bp promoter region
gives a much higher activity. The CAT activity in the leaf discs
floated on water (lane 4) has probably been induced through wounding
the leaf during punching. Here again, the CAT activity is considerably
stimulated by salicylic acid. The conclusion can be drawn that elements
responsible for the salicylic acid inducibility are present in the
region between nucleotides -114 and +8, while one or more enhancer
elements are present between nucleotides -645 and -400.
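
The reasoning of this deletion analysis can be written out as a short sketch. It encodes only the qualitative outcome stated in the text (all four constructs inducible; pPRC645 much more active than the other three); the class, field and function names are hypothetical, and no measured CAT values are implied.

```python
from dataclasses import dataclass

THREE_PRIME_END = 8  # all promoter fragments end at position +8


@dataclass(frozen=True)
class Construct:
    name: str
    five_prime: int   # 5' end of the promoter fragment
    inducible: bool   # salicylate-inducible CAT activity observed
    level: str        # qualitative activity: "low" or "high"


# Qualitative results as described in the text; no numeric data are implied.
CONSTRUCTS = [
    Construct("pPRC645", -645, True, "high"),
    Construct("pPRC400", -400, True, "low"),
    Construct("pPRC135", -135, True, "low"),
    Construct("pPRC114", -114, True, "low"),
]


def minimal_inducible_region(constructs):
    """Region carried by the shortest construct that is still inducible."""
    shortest = max((c for c in constructs if c.inducible),
                   key=lambda c: c.five_prime)
    return shortest.five_prime, THREE_PRIME_END


def enhancer_interval(constructs):
    """Interval present in high-activity but absent from low-activity constructs."""
    shortest_high = max((c for c in constructs if c.level == "high"),
                        key=lambda c: c.five_prime)
    longest_low = min((c for c in constructs if c.level == "low"),
                      key=lambda c: c.five_prime)
    return shortest_high.five_prime, longest_low.five_prime


print(minimal_inducible_region(CONSTRUCTS))
print(enhancer_interval(CONSTRUCTS))
```

The two intervals returned correspond to the two conclusions drawn above: an inducibility element within -114 to +8 and an enhancing element between -645 and -400.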



F. Description of the drawings

Fig. 1 shows the sequence of nucleotides of 
the structural GRP gene and the flanking DNA 
regions in gGRP-8 clone. The GRP reading frame 
has been aligned with the corresponding sequence 
of amino acids. The bold vertical arrow after the 
first 26 amino acids indicates the putative splitting 
site of a signal peptide. Initiation and termination 
signals involved in the transcription and translation 
procedures are underlined. The positions of direct repeats
with lengths of 17 and 18 basepairs and of
inverted repeats with lengths of 9 and 64 basepairs
are indicated by horizontal arrows. Putative activator
elements are boxed or indicated by a wavy line.

Fig. 2 diagrammatically shows the construc- 
tion of GRP-promoter/CAT gene fusions. In it, use 
has been made of the known pDH52 plasmid which 
contains the polyadenylation signal of the nopaline-synthase
gene (Tnos), of the known pIC19H and
pUC8 plasmids, of the known pCaMV-CAT plasmid
which contains the CAT gene of transposon Tn9,
and of a plasmid pDEL+8, the construction of
which is described herein.

Fig. 3 shows the results of tests in which the 
CAT activity was assayed in protein extracts of leaf 
discs of transgenic plants. The CAT activity was 
determined by the method of Gorman et al. (1982).
For further description, see the text. 
Claims



1. Recombinant DNA comprising vector-DNA and a DNA sequence
corresponding with, or related to, a salicylate-inducible promoter of a
GRP gene of plants.

2. Recombinant DNA as claimed in claim 1, comprising vector-DNA and a
salicylate-inducible promoter of a GRP gene of tobacco.

3. Recombinant DNA as claimed in claim 1, comprising vector-DNA and a
salicylate-inducible promoter of a GRP gene of Nicotiana tabacum cv.
Samsun NN.

4. Recombinant DNA as claimed in claim 1, comprising vector-DNA and the
DNA sequence of nucleotide -645 to nucleotide +8 of the GRP gene in
clone gGRP-8, or a variant or portion thereof having a
salicylate-inducible promoter activity.

5. Recombinant DNA as claimed in any of claims 1-4, comprising a
structural gene different from the structural GRP gene under the
control of the GRP promoter.

6. Microorganisms, plant cells and plants transformed using recombinant
DNA as claimed in any of claims 1-5, and progeny thereof which still
contain the promoter sequence introduced.

7. A process for producing a polypeptide or protein by culturing plants
or plant cells capable of synthesizing the desired polypeptide or
protein, and isolating the polypeptide or protein produced, which
comprises using plants which have been transformed, using recombinant
DNA as claimed in claim 5 with a GRP-promoter-controlled structural
gene therein which codes for the desired polypeptide or protein, and
inducing the production of the polypeptide or protein by contacting the
plants or plant cells with salicylate or another agent inducing the GRP
promoter.

FIG. 1A



TTTAAAAATTATATTATT6AATTGATATAAATATCGGATACGGACA6GMATGAACAGCC 
-1780 -1760 -1740 

TTTCAACTGATAAGGGACTGTTTGACATCTTGTGCTGCATTATCTTTTTCTTCATTCGTG 
-1720 -1700 -1680

TTTTAATTTGTAGGCCAGCAACCCCTTGCACGTGGTTTGACTCTTCCGATTCTCTCTCAA 
-1660 -1640 -1620 

ATACTTGTTCTTAAATTAAAAATATGTATAAAATArCAAAAATACTTTTTATGACGTAAG 
-1600 -1580 -1560 

CTATTTTCTACGTATCAATTTAGACGACGTAATTTGGTTTAACACAAAATTTATGAAAAA 
-1540 -1520 -1500 

ATAAAGACCTTTAAATATATAGACTTAAAAGCTTTGTGGGATATTTGCGTACGTATAAAA 
-1480 -1460 -1440 

GTTTTTCATTAAAATAAAGTGAGTAAAATGAAAAGTTTAAAGTTAAATTATTTTTAAATA 
-1420 -1400 -1380 

TAAAAATATTTTATTCTGGAACGGATTAATAAAAATGTGTTATTTAATTACGAGAGTATG 
-1360 -1340 -1320 

TCATAT ATATATATATAT ATATATATATATATATAfATATAGTCCTTAATAAGGAATCCA 
-1300 -1280 -1260 

TTAGTAGATCAGGTTAT7AATTTCTTTGTTTTTTTTTTTTTTGGTTTTAAGCGACTACTT 
-1240 -1220 -1200 

T AT AT T AG A AT T AAA A A T G T T T T GC AGGGAGT GGTTGCTCAT AGGC AGO AT T AC A AAAGG 
-1180 -1160 -1140 

TACTATGTAGAGCATAACCTACACTGGGATGCCTAGCTACACTAGTTGTACTGTTAGATG 
-1120 -1100 -1080

GAGGCGTAGCAATACTATTTAACATTGGTACATCAAAAATATTAATACrACTGCTACTAC 
-1060 -1040 -1020 

AGACATTACTAGAAGATGGCTTATCCGAAGGTTGACAAAATTTGTTCATGTGTGTACGCC 
-1000 -980 -960 

AGGCCmGCATTGAGATGTT TACT TGC TGATCCTGGAGGAGATGTTTGAGGATGAAAGG 
-940 -920 -900 

TGGAGGGTTGCTCAAAAAAGTGATGTTGCrcCATTCTTTGGAGTTAGACTGTGAAAATAT 
-880 -860 -840

TTTCTTTGTITGACAATTAATCTTGACCTGGATTACTTGCTTTTTACTATAAAAAAATTA 
-820 -800 -780 

AAT T f AAAT T r AT GC T T TGAGAAT AAGCGTAAGT TCAAC TC T T T AAG AGAGG T GGAGCGA 
-760 -740 -720 



GGATTTAAAATTTACGGGTTTGAGATTCTACTCCTTTTAAGTTATGAGAGATATTTTfAG 
-700 -680 -660 
FIG. 1B



TAAGCT TTTTATAAAATAAATATAGAATTTGAACAAAAACTACTACATTCAAACGCATCA 
-640 -620 -600 



ATAACCTAAACTCTACTTCTCCTCTAGTTCAAGACTCrCTTCA |rGTGGAAA} rGACATTAG 
-580 -560 ^54^ g 4 

GTAGCCATTTTAAACATGrTGTTrAAAATArArTCACAGTTTACAATGTATTIAAAGATr 
-520 -500 ft -480 n 



AGCAAT T TCGCTCAAACTTCAGGACATGGCGTCCTAGAGTTTAAACCTCAAAGT TTAAAC 
-460 -440 -420 



TTCAAGATATCGTATCCTAAAGTTCGAAAATTGTGTGTCCAGAAGTT TATGTCCTAAAT f 
17 -400 ig -380 17 -360 



T T AAAT TAATAGTTAAAAAATTCATGACACTTAATCCTAAATTTCAAATTACCATCrCAA 
-340 1Q -320 -300 g 4 

AAAT T CATGAC AC T T AGTCCAGAAT T TTGGATGAAT TAGC TCATCTTT T T ACACAT T AT A 
-280 -260 -240 



AAT T G T AAA r AT A T T T T AAAT AGCGAGC T T AAAAG T GAC T A T TGC TGC AC T TGG T C AGAC 
-220 -200 -180 

TTCACGTTCACTCTCTTTACTGCCACTTGTAGGCCGGGTTTCTTCGTGTCTTTGGTCCAC 

-160 -140 -120 ' 

CAAT 

AQAAJAA TQTACATTTTCCCTCATACCT CCAAGT AGTACCATTCCCTTCAATTATTTATG 

TATA °i P 

CATTCAAATCATAC TATAAAG AGAACCCAAGAGTACATCAGTTTCTTCATCCCTTAATTT 
-40 — -20 1 

M G S K A F I 
CATAAGCATCATAACTAAACTTTGAACAAAAAAAGAAAACATGGGTTCTAAGGCATTTCT 
20 40 60 

FlGljCLAFFFllSSEVVAGE 
GTTTCTTGGCCTTTGTTTGGCTTTTTTTTTCCTGATAAGCICTGAGGTTGTAGCTGGGGA 
80 100 120 

L a e t s N pi ^ intron 

ATTGGCTGAGACTTCCAACCGTAAGCTTACTCTCATTTTACTATGAAAAAATGAAAATCr 
140 *160 180 

CTTCTCTCATTATTTGATATAGGATTCAACTAATAATTATTTTGTATGCATTGAGTATTT 
200 220 240 

TAACrGTTGTAACATTCTTTAACCTTTCAAATTAGTGTTTATCAGCTAGCAAAGCTCAAT 
260 280 300 

TTAGTTTCCACATCGAGCTAGTAGTTGAGTTACATTACTATCGCTATAGCTTGATAATAA 
320 340 360 
FIG. 1C



CTCTTAATATGrAGTCCTTTTArTTCATTTrAAGTGTTTTAATTrGGATGGATArGAAGT 
380 400 420 

TTAAATGAGAATGTMGTAAAATCTTTGAATCTTGTGATTTTArAAAGTTGTATAAAAAC 
440 460 480 

ATACCAAAAAATATCCTTTAAATCTTGTGGTCTTAAACATGTCTTGTATAAGAAGAGCCA 
500 520 540 

T AA AGGG T AAA AA TG AG AA TGG T GG A AC T T AAA AC C T AC T T A T T G A T T AA A T A T A G A AAG 
560 580 600 

AGT A F T T F TC T TAAAAAATAAT AAAAGGAAAGAACGA T ACATAAA T TGAAACA I ATGAAG 
620 640 660



I MKLOGENG 
TACT ATGTATGTTTTAAnTTCATAATTGGTGCAGCAATGAAAf TGGAIGGCGAGAATGG 
680 700 720 

VOVQGRGGYNDYGGQGYYGG 
AGTAGACGTTGACGGACGTGGAGGATACAATGACGTTGGCGGCGATGGATATT ATGGTGG 
740 760 780

GRGRGGGGYK R R G C R YGCC R 
TGGTCGCGGCCGTGGTGGTGGTGGTTATAAACGTAGAGGATGCCGCTATGGTTGCrGCAG 
800 820 840 

KGYNGCKRCCSYAGEAMOKY 
GAAAGGTTACAATGGTTGCAAAAGGrGriGTTCCTACGCAGGTGAGGCCATGGATAAAGT 
860 880 900

TEAQPHH* 
CACTGAAGCTCAGCCTCACAACTGATCATTATGTGTAATArArAAAGAGTTTAAGTTAfA 
920 *940 960

TAfGTCGTTAGTATATGTAACTTATACGTTGTGACAAGATGTAATAATCrTGCTACTTTA 
980 1000 1020 

GACCTTGCTTGTAACAAGTAT GAATAAA GCCATTCGGTTCTTArGGATGGT TGGTCAf ST 

1040 1060 end of cDNA

^ 

AATGTTTTGTTGrACAATATTTTGTGACAATATGTTTCCATATTGTTfATTTTCTTCATA 
1100 1120 1140 

TTTTAGAGTAAAGGGTTTTCTTT7ATTTTATGAATCCGACAATTTTCTTTTAATTTCATC 
1160 1180 1200 

CGCGAATTTACAATTCAAGAAGAGATGGAGATCCAATACAACTAACGGGTTCTGGTTGAA 
1220 1240 1260 

TTC 
FIG. 3

[Figure: CAT assay of leaf disc extracts, lanes 1 to 13; lanes marked W (water) and S (salicylic acid).]